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[l] In this paper we continue the community-wide rigorous modern space weather model validation 
efforts carried out within GEM, CEDAR and SHINE programs. In this particular effort, in coordination 
among the Community Coordinated Modeling Center (CCMC), NOAA Space Weather Prediction 
Center (SWPC), modelers, and science community, we focus on studying the models' capability to 
reproduce observed ground magnetic field fluctuations, which are closely related to geomagnetically 
induced current phenomenon. One of the primary motivations of the work is to support NOAA SWPC 
in their selection of the next numerical model that will be transitioned into operations. Six 
geomagnetic events and 12 geomagnetic observatories were selected for validation. While modeled 
and observed magnetic field time series are available for all 12 stations, the primary metrics analysis is 
based on six stations that were selected to represent the high-latitude and mid-latitude locations. 
Events-based analysis and the corresponding contingency tables were built for each event and each 
station. The elements in the contingency table were then used to calculate Probability of Detection 
(POD), Probability of False Detection (POFD) and Heidke Skill Score (HSS) for rigorous quantification 
of the models' performance. In this paper the summary results of the metrics analyses are reported in 
terms of POD, POFD and HSS. More detailed analyses can be carried out using the event by event 
contingency tables provided as an online appendix. An online interface built at CCMC and described 
in the supporting information is also available for more detailed time series analyses. 
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1. Introduction 

[ 2 ] The geomagnetically induced current (GIC) problem 
[e.g., Boteler et al., 1998; Pirjola, 2005] has received elevated 
international interest over the past 3-4 years, especially 
in terms of the potential impact on high-voltage power 
transmission systems. The current worst-case scenarios 
range from wide-scale voltage and system collapses [North 
American Electric Reliability Corporation, 2012] to catas- 
trophic loss of a large number of high-voltage power 
transformers [National Research Council, 2008]. While bet- 
ter quantification of the hazard will require additional 
interdisciplinary science and power engineering inves- 
tigations, it is commonly accepted that the problem is 
serious enough that actions need to be taken for mitigat- 
ing the impact. Consequently, the space weather modeling 
and forecasting community is responding to this ele- 
vated need by supporting the operational utilization of 
the latest advancements in science. More specifically, the 
community needs to work on new regional or even local 
predictions of the geomagnetic environment pertaining 
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Table 1. Geospace Events Studied in the Validation Activity 3 


Event # 

Date and Time 

Min (Dst) 

Max (Kp) 

1 

29 October 2003 06:00 UT-30 October 06:00 UT 

-353 nT 

9 

2 

14 December 2006 12:00 UT-16 December 00:00 UT 

-139 nT 

8 

3 

31 August 2001 00:00 UT-1 September 00:00 UT 

-40 nT 

4 

4 

31 August 2005 10:00 UT-1 September 12:00 UT 

-131 nT 

7 

5 

5 April 2010 00:00 UT-6 April 00:00 UT 

-73 nT 

8- 

6 

5 August 2011 09:00 UT-6 Aug 09:00 UT 

-113 nT 

8- 


a The last two columns give the minimum Dst index and the maximum Kp index of the 
event, respectively. 


to GIC. Initial steps toward this goals have been taken 
both on the empirical and first-principles based modeling 
fronts [e.g., Weigel et al., 2003; Wintoft, 2005; Weimer et al., 
2010; Pulkkinen et al., 2009; Zhang et al., 2012]. The next log- 
ical step is to transition the new scientific capability into 
an operational setting, which is the primary motivation of 
the work at hand. 

[ 3 ] Understanding model capabilities to reproduce 
observed features in the signal of interest is a key ele- 
ment of space weather monitoring and forecasting. Quan- 
tification of the model performance becomes critical as 
one moves from the research to operational environment 
where inaccurate model estimates and misleading error 
bars can potentially lead to poor and costly decisions 
by the end user. Consequently, detailed model valida- 
tion tests for model robustness and general quality checks 
(e.g., model response to bad input data) are a central part 
of model transition to operations and of general interest to 
operational space weather forecasting entities. 

[ 4 ] Acknowledging the importance of rigorous model 
validation and building on the earlier reports on the 
topic [Pulkkinen et al., 2010, 2011; Rastatter et al., 2011], 
as well as the excellent work on geospace model val- 
idation conducted under the auspices of the Geospace 
Environment Modeling (GEM) Metrics and Validation 
Focus Group, NOAA's Space Weather Prediction Center 
(SWPC) requested the Community Coordinated Model- 
ing Center (CCMC) to evaluate geospace models avail- 
able at the CCMC for possible transition to operations. 
This effort included the participation of model develop- 
ers, as well as the CCMC, SWPC, and through GEM, the 
broader scientific community. Planning and discussions 
with modelers and the scientific community were held 
at GEM-sponsored meetings, the annual Space Weather 
Workshop in Boulder, and at meetings of the American 
Geophysical Union. One benefit of building on previous 
work done by the GEM Geospace Environment Modeling 
Challenge is that, over time, we will be able to track model 
improvements, as new and improved versions of existing 
models or new models are delivered to the CCMC. 

[ 5 ] The definition of the validation setting, selection of 
metrics, and the general validation process were discussed 
comprehensively and agreed as the work progressed over 
the past approximate 2 years. All intermediate results 
of the analyses carried out by CCMC were communi- 
cated to the community and modelers, and it was made 
certain that the model installations and tools at CCMC 


were acceptable to all participating groups. Generally, the 
validation process was made as transparent as possible 
including early communication of NOAA SWPC criteria 
for selecting models entering the transition process. 

[6] In contrast to earlier GEM efforts on the topic, the 
focus of the latest model validation effort was to study 
the models' capability to reproduce the observed "dBldt 
events," i.e., rapid fluctuation of the ground magnetic 
field. The primary argumentation for studying dBldt is 
that the time derivative of the ground magnetic field 
(referred to as "dBldt") can be used as an indicator for the 
level of geomagnetically induced electric field or geoelec- 
tric field, on the surface of the Earth [e.g., Viljanen et al., 
2001]. The geoelectric field, in turn, is the primary physi- 
cal quantity driving GIC. Consequently, although numer- 
ous additional complexities such as ground conductivity, 
conductor system configuration, and other engineering 
details including high-voltage power transformer design 
are critical for more detailed assessment of the threat, 
dBldt can be used as an indicator for a potential GIC haz- 
ard. Further, if data from an upstream monitor such as 
NASA's Advanced Composition Explorer (ACE) is used to 
produce dBldt, one can generate short lead time (15-30 
min) forecast estimates of the potential hazard. 

[ 7 ] The structure of the paper is as follows. In section 2 
of the paper, we will describe the setting used in the val- 
idation effort. Section 3 details the metrics used in the 
quantification of the model performance, and in section 4, 
each participating model is summarized. The main results 
of the validation effort are reported in section 5. Finally, 
section 6 provides a brief discussion of our findings. 

2. Validation Setting 

[8] Six geospace storm events listed in Table 1 were cho- 
sen for the study. We note that although the number of 
events may seem small, the amount of effort required 
for analysis of individual events including verification of 
good quality simulations and processing of observational 
data did not allow larger sample size. Four of the events 
(events 1-4 in Table 1) were used in the earlier GEM Chal- 
lenges [ Pulkkinen et al., 2010, 2011; Rastatter et al., 2011] 
and two new "surprise events" not communicated to the 
modelers prior to the model and model setup delivery to 
CCMC were added to the list. The two new events were 
selected jointly by CCMC and NOAA SWPC scientists. 
Solar wind bulk plasma and the interplanetary magnetic 
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Figure 1. Solar wind bulk plasma and the interplanetary magnetic field observations 
(in each panel from top to bottom: plasma density, plasma temperature, x-component of 
the plasma flow velocity, y-component of the plasma flow velocity, z-component of the 
plasma flow velocity, x-component of the magnetic field, y-component of the magnetic field, 
z-component of the magnetic field) for the studied storm events (Figures la-lf correspond- 
ing to events 1-6, respectively) given in Table 1. Data in Geocentric Solar Magnetospheric 
coordinates. See the text for details. 


field observations carried out by Solar Wind Electron, 
Proton, and Alpha Monitor (SWEPAM) and MAG instru- 
ments onboard Advanced Composition Explorer (ACE) for 
the events are shown in Figure 1. Level 2 ACE data were 
used in the analyses. Due to limitations of the SWEPAM 
instrument during the October 2003 event (event 1), only 


low temporal resolution plasma velocity data could be 
constructed [Skoug et at, 2004]. Further, the plasma density 
data for the event were obtained from the Geotail Plasma 
Wave Instrument. Earlier, GEM Challenge events 1 and 
2 are well-known coronal mass ejection-related major 
storm events, and events 3 and 4 are less active periods 
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Table 2. The Locations of the Geomagnetic Observatories Used in the Study 3 


Station Name 

Station Code 

Geomagnetic Latitude 

Geomagnetic Longitude 

Yellowknife 

YKC 

68.9 

299.4 

Meanook 

MEA 

61.6 

306.2 

Newport 

NEW 

54.9 

304.7 

Fresno 

FRN 

43.5 

305.3 

Iqaluit 

IQA 

74.0 

5.2 

Poste-de-la-Baleine 

PBQ 

65.5 

351.8 

Sanikiluaq 

SNK 

66.4 

356.1 

Ottawa 

OTT 

55.6 

355.3 

Fredericksburg 

FRD 

48.4 

353.4 

Ftornsund 

HRN 

73.9 

126.0 

Abisko 

ABK 

66.1 

114.7 

Wingst 

WNG 

54.1 

95.0 

Furstenfeldbruck 

FUR 

48.4 

94.6 


“Bold typeface stations indicate the six stations (stations PBQ and SNK are alternates, see 
the text for details) used in the final analyses discussed in section 5. 


associated with much more subtle changes in the solar 
wind driving. Events 1-4 are from the solar cycle 23. The 
new surprise event 5 is one of the first CME-related events 
of the cycle 24 and was of special interest due to the very 
large substorm event that was associated with the storm. 
Event 6 in turn was the first severe storm of the cycle 24. 

[ 9 ] Solar wind observations were propagated to model 
inflow boundaries by ballistic propagation and the 
x-component (Geocentric Solar Magnetospheric coordi- 
nate system) of the interplanetary magnetic field was set 
to zero. While solar wind propagation constitutes a source 
for modeling errors, identical uncertainties were intro- 
duced for all models in the specification of the inflow 
boundaries. 

[ 10 ] For each event in Table 1, the model performance 
was evaluated by comparing the observed versus pre- 
dicted ground dBldt. Throughout the paper, dBldt is 
defined as follows: 

dBldt = y ( dB,/df)2 + (dB y /df) 2 (1) 

where B x and B y indicate the two horizontal components 
of the magnetic field (geomagnetic dipole coordinates). 
60 s geomagnetic observatory recordings were used to 
provide the observed signal. It is noted that while opti- 
mally higher temporal resolution data such as 1 s (not 
available for the selected stations and events) should be 
used, mapping from dBldt into the geoelectric field driv- 
ing GIC is typically a low-pass filtering operation. Conse- 
quently, 60 s sampling rate of the ground magnetic field 
is able to capture mostly the same features of the sur- 
face geoelectric field variations as that of the 1 s sampling 
rate [Pulkkinen et ah, 2006]. Again, following the earlier 
GEM Challenges, 12 geomagnetic observatories (magne- 
tometer stations) listed in Table 2 and shown in Figure 2 
were selected based on the global spatial and temporal 
coverage. Station PBQ was discontinued November 2007 
and replaced by station SNK. Consequently, for events 5 
and 6, station SNK was used in place of PBQ. One minute 
temporal resolution magnetic field recordings were down- 


loaded via INTERMAGNET (www.intermagnet.org). The 
data were transformed from geographic coordinates, as 
provided by INTERMAGNET, into geomagnetic dipole 
coordinates. IGRF 2000 coefficients were used to com- 
pute the coordinate transformation matrices as given by 
Hapgood [1992]. The quiet-time baseline level was deter- 
mined visually for each station and for each event, and 
the baseline was removed from the magnetic field data to 
obtain the disturbance field. Small data gaps with length 
of no more than few minutes were patched by means 
of linear interpolation. The modeled magnetic field data 
were resampled by means of spline interpolation to match 
the time stamps of the observations. 

[ 11 ] In contrast to the earlier GEM Challenges on 
the topic, modeled ground magnetic field variations 
were computed by integrating over the global model 
electric current sources both in the ionosphere and 
magnetosphere. The integration includes magnetic field 
perturbations generated by all major sources including 
high-latitude ionosphere, magnetopause, magnetotail, 
and the ring current. The CCMC tool used for integra- 
tion was validated, for example, against a comparable 
Space Weather Modeling Framework tool [Yu and Ridley, 
2008] and is described in detail in a companion paper by 
Rastatter et al. (Community-wide geospace model valida- 
tion: Computation of SB at the CCMC, submitted to Space 
Weather, 2013). The two empirical models (see Table 3) 
provided direct predictions of the magnetic field at the 
used station locations. All model runs and ground mag- 
netic field calculations were carried out at CCMC. 

[ 12 ] The ground magnetic field is generally a sum 
of internal (induced) and external components. While 
empirical models take also the internal component implic- 
itly into account, the first-principles models applied here 
do not. The magnitude of the magnetic signal originat- 
ing from the induced currents is dependent on a number 
of factors such as the ground conductivity structure, 
the source structure, and the distance from the source. 
However, as a very rough rule of thumb, about 20-40% 
of the ground magnetic field can be of internal origin 
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60°W 



Figure 2. The locations and the station codes of the geomagnetic observatories used in the 
study. Geomagnetic dipole coordinates are used. Thick and thin circles indicate high-latitude 
and mid-latitude stations, respectively, used in the final analyses in section 5. 


during active periods [Tanskanen et al, 2001; Pulkkinen and 
Engels, 2005]. While ideally also the internal contribution 
should be taken into account, more detailed geomagnetic 
induction calculations require information about the local 
ground conductivity structure not known for the stations 
used in this study. As the majority of the observed dBldt 
(horizontal components) come from the external sources, 
we do not believe exclusion of the internal part of the field 
for fist-principles models affects the central results of this 
paper. 

[ 13 ] Six stations were selected out of the original 12 
GEM Challenge stations to represent the high-latitude 
and mid-latitude locations. The selected high-latitude sta- 
tions are PBQ/SNK, ABK and YKC and mid-latitude sta- 
tions WNG, NEW, OTT (see Table 1 and Figure 2). The 
selected six stations represent all three meridional chains 
used in the earlier Challenges and have equal weight on 
mid- and high-latitude locations. No observed data was 


available for station ABK for event 5. Although all 12 
stations are available for both observation and model pre- 
diction data sets and can be viewed via CCMC's online 
model validation interface, only the above six stations are 
used in the results discussed in section 5. 

3. Selected Metrics 

[ 14 ] Based on the earlier GEM Challenge experiences 
and operational needs in terms of dBldt prediction 
capability, it was agreed that the model validation should 
be built on event-based analyses. An event is defined 
here as follows: within a forecast window 0 < t < tf, 
the absolute value of the parameter of interest exceeds 
an event threshold |x thr es| (here dBldt). The windows are 
moved over the time series in non-overlapping segments, 
and events for the given tf and |x t hres| are recorded for both 
the measured and the modeled x. By comparing threshold 
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Table 3. Models Analyzed in the Validation Effort 3 


Identifier 

(Model Version) Model 

Grid (No. of Cells, Min. Res.) 

2 LFM-MIX 
3_WEIGEL 
4 OPENGGCM 
5_WEIMER 
9 SWMF 

(LTR-2.1.1) LFM coupled with ionospheric electrodynamics 
empirical model 

(OpenGGCM 4.0) global MHD coupled with CTIM 
empirical model 

(SWMF 2011-01-31) BATS-R-US coupled with RIM and RCM 

163,000, 0.4 R e 
N/A 

3.9 million, 0.25 Re 
N/A 

1 million, 0.25 Re 

a Each model is assigned a unique model identifier given by the leftmost column of the table. The table indicates 


the model setting, and if applicable, the number of cells and the minimum spatial resolution used in the global MHD 
part of the mode. See text in section 4 for details. 


crossing for both observed and modeled time series, one 
can then build a four-element matrix known as contin- 
gency table. The table reports the number of correct hits, 
false alarms, missed events, and correct no events [e.g., 
Lopez et al., 2007]. 

[ 15 ] In this work the length of the analysis window tj 
was selected to be 20 min, and the thresholds dBldt 0.3, 
0.7, 1.1 and 1.5 nT/s were used. The selected thresholds 
represent values that both span lower and higher ranges 
of rates of change and are also in the "mid-range" in a 
sense that enough threshold crossing could be detected 
for good statistics. We carried out systematic sensitiv- 
ity analyses to study the impact of the selected forecast 
window length. While predictability of events gets some- 
what poorer with shorter window lengths, the ranking of 
the models did not change significantly as a function of 
the analysis window length (not shown). Consequently, it 
was concluded that varying the analysis window length 
between 10 and 45 min did not change the central 
results notably. 

[ 16 ] The elements of the contingency table contain the 
number of correctly predicted threshold crossings H (hits), 
the number of false alarms F, the number of missed 
crossings M, and the number of correctly predicted no 
crossings N. The set {H,F,M,N} can be used to compute a 
number of different metrics quantifying the performance 
of individual models. In this study, three metrics proposed 
by NOAA SWPC were selected for use in the final anal- 
yses. The selected metrics are Probability of Detection 
(POD), Probability of False Detection (POFD) and Heidke 
Skill Score (HSS). We describe each metric more in detail 
in the following subsections. 


3.2. Probability of False Detection 

[is] POFD is defined for the set {H, F, M, N} as 

POFD = (3) 

F + N 

The metric measures the fraction of correctly predicted 
no crossings that were incorrectly forecast as crossings. 
POFD ranges from 0 to 1, with 0 being a perfect score. 
Similar to POD, a model predicting artificially low-signal 
amplitudes will provide low F and small PODF, and thus, 
the metric should be used in conjunction with POD. 

3.3. Heidke Skill Score 

[ 19 ] HSS is defined for the set {H,F,M,N} as 


HSS = 


2(HN - MF) 

(H + M)(M + N) + (H + F)(F + N) 


(4) 


The metric measures the fraction of correctly predicted 
threshold crossings after eliminating those predictions 
that would be correct purely by random chance. It ranges 
from negative infinity to 1. Negative values indicate that 
random forecast is better than the model prediction, 0 
indicates no skill (as good as random), and 1 indicates a 
perfect score. 

[ 20 ] It is noted that for HSS to be meaningful measure 
of the model performance a variety of states of the system 
should be studied. For example, perfect prediction of no 
1.5 nT/s crossings for a weak event (H = M = F = 0) is 
reported as HSS = 0/0, which is not defined. Consequently, 
to guarantee well-defined HSS, one should be careful to 
pick thresholds that are crossed for the selected sets of 
events. 


3.1. Probability of Detection 

[ 17 ] POD is defined for the set {H, F, M, N} as 


The metric measures the fraction of observed threshold 
crossings which were correctly forecast. It ranges from 0 
to 1 with 1 being a perfect score. Since a model providing 
artificially large signal amplitudes will tend to generate 
large H and large POD, the metric should be used in 
conjunction with POFD defined below. 


4. Models 

[ 21 ] Five models were evaluated in the model validation 
activity. These included empirical models by D. Weimer 
(Virginia Polytechnic Institute) and R. Weigel (George 
Mason University) and major US global magnetohydro- 
dynamic (MHD) models from University of Michigan, 
Center for Integrated Space Weather Modeling (CISM) 
and University of New Hampshire. Also, the Finnish 
Meteorological Institute's Grand Unified Ionosphere- 
Magnetosphere Coupling Simulation (GUMICS) global 
MHD group participated in the discussions associated 
with the validation work. However, while GUMICS is 
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available for runs-on-request at CCMC, due to the cur- 
rent serial implementation of the model, the group did 
not participate in the validation of the model itself. The 
full parallel version of GUMICS is expected to be available 
early 2013. 

[ 22 ] All models that participated in the validation effort 
were delivered to CCMC by January 2011. CCMC had 
extensive communications with the model developers to 
guarantee correct installation and to ensure the usage of 
appropriate settings for each model. Based on a variety of 
tests, such as code robustness carried out at CCMC, model 
developers provided revisions to the model settings. The 
final selection of all model settings was accomplished by 
mid-August 2011. To allow for simulations in a realis- 
tic real-time computational environment, it was required 
that settings for all models were such that the simulations 
would run not slower than twice the modeled physical 
time on 64 Beowulf cluster processors. All simulations 
were performed at CCMC using the same computational 
architecture. 

[ 23 ] Below, each model and settings pertaining to the 
validation activity are described. Table 3 summarizes some 
of the key features of each individual model. A version of 
the Weimer model and all global MHD models discussed 
in this work are available at CCMC for runs on request. 

4.1. Weimer Empirical Ground Magnetic Field 
Prediction Model 

[ 24 ] The empirical model supplied by Daniel Weimer 
provides values of the magnetic perturbations at the 
surface of the Earth, for vector components in the 
North, East, and Vertical (down) directions. Vectors are 
returned as a function of location in either geographic lat- 
itude/longitude coordinates or corrected geomagnetic lat- 
itude/magnetic local time coordinates. More specifically, 
the model internally uses "Modified Apex" coordinates 
[Richmond, 1995; Emmert et al., 2010; VanZandt et ah, 1972]. If 
geographic locations are specified on input, then the out- 
put vectors are also in geographic coordinates such that 
positive North is toward the geographic pole; otherwise. 
North is toward the corrected geomagnetic apex pole. 
For the purpose of the challenge, the output vectors are 
rotated into geomagnetic coordinates. The data needed to 
drive the model are the Geocentric Solar Magnetospheric 
(GSM) y- and z-components of the interplanetary mag- 
netic field (IMF), the solar wind velocity, the dipole tilt 
angle of the Earth's magnetic field, and the solar F10.7 
index. Ideally, 25 min mean values of the IMF and solar 
wind velocity should be used, with a 20 min delay after 
propagation to the magnetosphere's bow shock [Weimer 
and King, 2008]. 

[ 25 ] The model uses Spherical Cap Harmonic Analysis 
(SCHA) [Haines, 1985] within a cap that extends down to 
33.4° apex latitude. SCHA coefficients up to order m = 3 
and degree Z = 16 are used. Measurements of the IMF 
and solar wind on the ACE spacecraft, from February 1998 
through December 2005, were used to generate the model, 
as well as measurements from over 120 magnetometer 


stations. Details of data preparation and initial tests are 
provided by Weimer et ah [2010]. A least error fit was used 
to find how each of the SCFLA coefficients varies as a lin- 
ear function of the input values, with 17 terms for each 
coefficient. In order to handle the nonlinear "saturation" 
response of the ionosphere, the fits are derived sepa- 
rately within 23 bins, divided according to the magnitude 
of the IMF. 

[ 26 ] More recently, an improved version of this model 
has been developed, but it was not available for the val- 
idation work discussed in this paper [Weimer, 2013]. The 
latest version extends down to the geomagnetic equator, 
using data from 143 magnetometer stations. It uses spher- 
ical harmonics up to degree l = 31, and divides the IMF 
measurements into 29 bins. 

4.2. Weigel Empirical Ground Magnetic Field 
Prediction Model 

[ 27 ] Three models were used for this study. All of the 
models were developed using available 1 min ground 
magnetometer measurements from World Data Centre for 
Geomagnetism (Edinburgh) (http://www.wdc.bgs.ac.uk/) 
in the time interval 1 January 2000 through 31 December 
2006, excluding event intervals that occurred in this time 
range. 

[ 28 ] The first model is the climatological average of A B 
(disturbance field) for each component computed by tak- 
ing the average of A B in 48 local times. 

[ 29 ] The second and third models are linear impulse 
response filters using the method of Weigel [2007], Both 
models predict geomagnetic disturbance G (AB or d Bldt) 
in vector direction i using 

Nc 

Gilt, LT) = Zi A ,ir + J2 vB ^ ~ O*#', LT) (5) 

t'= 0 

where the solar wind velocity v, and the rectified -z com- 
ponent of the interplanetary magnetic field, B s = 1/2(|B Z | - 
B z ), are from the OMNI high-resolution data set (http:// 
omniweb.gsfc.nasa.gov/ow$_$min.html), and the h coeffi- 
cients depend on local time, LT. The model coefficients 
are determined using a least-squares minimization of the 
prediction error. 

[ 30 ] The model for G; = AB, has Nc = 12 and pre- 
dicts G; at 48 local times. Physically, this model predicts 
the ground magnetic field given the past 6 h of solar 
wind measurements. Data were resampled to place the 
predictions on a 1 min time grid. 

[ 31 ] The model that predicts G = dB/df has Nc = 4 and 
predicts G at 1440 local times. The model G = dB/df was 
used in the metrics analyses carried out in this paper. 

4.3. Space Weather Modeling Framework 

[ 32 ] The Space Weather Modeling Framework (SWMF) 
[Toth et ah, 2005, 2012] (http://csem.engin.umich.edu/ 
swmf) is a flexible software framework designed to model 
a variety of space physics phenomena. The SWMF divides 
the complex space physics systems into physics domains. 
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The domains used in the SWPC modeling challenge are 
the Global Magnetosphere (GM), the inner magneto- 
sphere (IM), and the Ionosphere Electrodynamics (IE). 

[ 33 ] The GM model is the Block-Adaptive Tree Solar- 
wind Roe-type Upwind Scheme (BATS-R-US) [Powell 
et ah, 1999; Gombosi et al., 2004]. In the work described 
in this paper, the semi-relativistic MHD equations 
[Gombosi et ah, 2002] are solved. We use an explicit/implicit 
time stepping scheme [Toth et ah, 2006] with a 5 s time 
step (potentially reduced by the adaptive time step control 
scheme if necessary). The computational domain extends 
from 32 R E upstream to -224 R E downstream in the x direc- 
tion and ±128 Re in the y and z coordinates (GSM). The 
inner boundary is at 2.5 Re distance from the center of 
the Earth. The domain is discretized with a block-adaptive 
Cartesian grid. The roughly 1 million grid cells vary in size 
from 1/4R £ near the inner boundary to 8 R £ in the distant 
tail. The boundary conditions are the usual(see referred 
publications for details), except for the density at the inner 
boundary, which is set as p inne r = 28 + 0.1 CPCP, where 
CPCP is the average of the northern and southern cross 
polar cap potentials measured in keV, and the density is 
measured in amu/cm 3 . In these runs, we used the artifi- 
cial wind scheme [Sokolov et ah, 2002] with Koren's limiter 
[Korea, 1993] (ft = 1.2) and the eight-wave scheme [Powell 
et ah, 1999], 

[ 34 ] The IM domain is represented by the Rice Convec- 
tion Model (RCM) [Wolf et ah, 1982; Toffoletto et ah, 2003]. 
RCM solves for the bounce averaged and isotropic but 
energy resolved particle distribution of electrons and var- 
ious ions. We used the standard RCM settings except for 
one modification: we added an exponential decay term 
to the RCM equations, so that the phase space density 
decays toward zero with 10 h e-folding rate. With this 
modification the Dst index of the coupled model recovers 
better after large storms. 

[ 35 ] The IE domain is represented by the Ridley Iono- 
sphere Model (RIM) [Ridley et ah, 2004]. RIM uses the 
field-aligned currents obtained from GM and the F10.7 
flux (set as an input parameter for each event) to calculate 
particle precipitation and conductances based on empiri- 
cal relationships. RIM solves a Poisson-type equation for 
the electric potential on a 2-D spherical grid. We set the 
lower latitude boundary to 10°. 

[36] In the work described in this paper, the BATS-R-US 
and RIM models are coupled every 5 s, while the BATS- 
R-US with RCM as well as the RIM to RCM couplings 
are done every 10 s. In the BATS-R-US-RIM coupling, the 
MHD model calculates the field-aligned currents (FAC) at 
3 R £ and maps it down to the ionospheric grid. The elec- 
tric field obtained by RIM, is mapped back to the inner 


boundary of GM, where the E x BIB 2 velocity is calculated. 
The cross polar cap potentials are also sent to GM, and are 
used to set the density at the inner boundary. In the RIM to 
RCM coupling, the electric potential is passed and inter- 
polated onto the RCM grid. In the BATS-R-US to RCM 
coupling, BATS-R-US finds the closed field line region and 
calculates field volume integrals with an efficient paral- 
lel field line tracing algorithm [Glocer et ah, 2009a]. The 
integrated GM density and pressure are applied as outer 
boundary conditions for the IM model assuming a 90% 
H + to 10% 0 + number density ratio. In the RCM to BATS- 
R-US coupling, the GM grid cell centers are traced to 
the inner boundary along the magnetic field lines with 
an efficient parallel algorithm [De Zeeuw et ah, 2004]. The 
BATS-R-US pressure and density are nudged toward the 
RCM values with a 20 s relaxation time. 

[ 37 ] In addition to the basic variables used in the 
various models, the SWMF can also calculate various 
plasma parameters along satellite trajectories, ionospheric 
foot-points of satellites, integrated line-of-sight images, 
various geomagnetic indexes (Dst,Kp), as well as local 
magnetic perturbations [Yu and Ridley, 2008]. The SWMF 
can model space weather events starting from the Sun 
all the way to the Earth [Toth et ah, 2007]. The magne- 
tospheric components of the SWMF have been validated 
in several studies [Ridley et ah, 2002; Zhang et ah, 2007; 
Glocer et ah, 2009c; Welling and Ridley, 2010]. In addition to 
the magnetospheric models used in this study, the SWMF 
also contains the radiation belt and the polar wind com- 
ponents [Glocer et ah, 2009a, 2009b], and the CRCM and 
RAM-SCB inner magnetosphere models [Buzulukova et ah, 
2010; Zaharia et ah, 2010]. 

4.4. Lyon-Fedder-Mobarry Model With 
Magnetosphere-Ionosphere Coupler and 
Electrodynamics Solver 

[ 38 ] The Coupled Magnetosphere Ionosphere Ther- 
mosphere Model (CMIT) [Wang et ah, 2004; Wiltberger 
et ah, 2004] couples the Lyon-Fedder-Mobarry global 
magnetosphere model (LFM) [Lyon et ah, 2004] with the 
Thermosphere-Ionosphere-Electrodynamic Global Circu- 
lation Model (TIEGCM) [Roble and Ridley, 1994] via the 
Magnetosphere Ionosphere Coupler Solver (MIX) [Merkin 
and Lyon, 2010] to provide a comprehensive global simula- 
tion of geospace response to solar and solar wind drivers. 
The LFM portion of the model solves the ideal magne- 
tohydrodynamic equations to describe the interaction of 
the solar wind plasma with the plasma in geospace. This 
portion of the model requires that the solar wind and 
IMF conditions be specified typically from ACE or WIND 
spacecraft observations. These conditions are assumed 


Figure 3. Time series of the observed (blue curves) versus modeled (black curves) dBldt at 
the three high-latitude stations indicated in Table 2 for event 2 indicated in Table 1. The time 
is magnetic local time (MLT) and the dashed lines indicate the dBldt thresholds of 0.3, 0.7, 1.1 
and 1.5 nT/s. Results are shown for the models as follows: (a) 2_LFM-MIX, (b) 3_WEIGEL, (c) 
4_OPENGGCM, (d) 5_WEIMER, and (e) 9_SWMF (see Table 3). 
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Event no. 2, HIGH-LAT station YKC, model: 2_LFM-MIX (black) 
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Event no. 2, MID-LAT station NEW, model: 2_LFM-MIX (black) 
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Figure 5. Probability of Detection (POD) (blue curve) and Probability of False Detection 
(POFD) (black curve) defined in section 3 for the dB/d/ thresholds (a) 0.3 nT/s, (b) 0.7 nT/s, (c) 
1.1 nT/s, and (d) 1.5 nT/s. In Figures 5a-5d, the top panel shows POD and POFD obtained by 
integrating over the three mid-latitude stations, and the bottom panel shows POD and POFD 
obtained by integrating over the three high-latitude stations. The models (see Table 3) are 
ordered according to their POD. The model with the largest POD is the leftmost in all panels. 


to be constant along planar fronts propagating through 
the computational domain. The LFM is electrodynami- 
cally coupled to the ionosphere through the MIX model. 
MIX solves for the cross polar cap potential taking cur- 
rents from the magnetospheric domain and conductance 
from the ionosphere. In order to obtain the conductance 
information, MIX uses a series of empirical relationships 
described in Wiltberger et al. [2009] to transform the MHD 
parameters at the inner boundary into a characteristic 
energy and flux of precipitating electrons. The ionospheric 
component uses the electron flux information along with 
a parameterization of the solar extreme ultraviolet (EUV) 
flux driven by the F10.7 index to compute the conductance. 
It is noted that, in this validation work, TIEGCM was not 
used. Instead of the full ionosphere-thermosphere system, 
only the ionospheric electrodynamics was treated via MIX. 


[ 39 ] For the validation work discussed in this paper, 
we used a modest resolution version of the model that 
provides reliable performance on small amount of com- 
putational resources. In the LFM, the simulation grid was 
53(radial) x 48(azimuthal) x 64(polar) points allowing for a 
typical resolution in the inner magnetosphere of roughly 
0.4 R e . The electrodynamic grid used in MIX was 2° x 2° 
covering the high latitude region down to a magnetic 
colatitudes of 45°. The model runs faster than real time 
on 24 processors. In the magnetosphere, the typical time 
step is approximately 0.1 s. The electrodynamic coupling 
between the ionosphere and magnetosphere is updated 
every 5 s. The computational model can provide a vast 
array of information relevant to space weather ranging 
from the magnetic fields at geosynchronous orbit to the 
ground magnetic field perturbations. 


Figure 4. Time series of the observed (blue curves) versus modeled (black curves) dBldt at 
the three mid-latitude stations indicated in Table 2 for event 2 indicated in Table 1. The time 
is magnetic local time (MLT) and the dashed lines indicate the dBldt thresholds of 0.3, 0.7, 1.1 
and 1.5 nT/s. Results are shown for the models as follows: (a) 2_LFM-MIX, (b) 3_WEIGEL, (c) 
4_OPENGGCM, (d) 5_WEIMER, and (e) 9_SWMF (see Table 3). 
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Figure 6. Heidke Skill Score (HSS) defined in section 3 for the dBldt thresholds (a) 0.3 
nT/s, (b) 0.7 nT/s, (c) 1.1 nT/s, and (d) 1.5 nT/s. In Figures 6a-6d, the top panel shows HSS 
obtained by integrating over the three mid-latitude stations, and the bottom panel shows 
HSS obtained by integrating over the three high-latitude stations. The models (see Table 3) 
are ordered according to their HSS. The model with the largest HSS is the leftmost in all 
panels. 


[ 40 ] CMIT and its component models have been used 
to study a variety of processes in geospace ranging from 
magnetic storms [Goodrich et al, 1998] to substorms [Lopez 
et al, 1998; Wiltberger et al., 2000] including driving by 
CMEs [Baker et al, 2004] and CIRs [Wiltberger et al., 2012]. 
We have validated the model against numerous mea- 
surements of magnetopause crossings [Lopez et al, 2006; 
Garcia and Hughes, 2007], ground magnetometer observa- 
tions [Wiltberger et al, 2003], and climatology data from 
Geotail [Guild et al, 2008a, 2008b]. The version of the 
model used in this study does not include an inner 
magnetosphere model, but coupling with the Rice Con- 
vection Model has recently been completed [Pembroke 
et al, 2012] and will be part of a future release to the 
CCMC. 

4.5. Open General Geospace Circulation Model 

[ 41 ] The Open General Geospace Circulation Model 
(OpenGGCM) global MHD model simulates the inter- 
action of the solar wind with the magnetosphere- 
ionosphere-thermosphere system. Besides numerically 
solving the MHD equations with high spatial resolu- 
tion in a large volume containing the magnetosphere, 
the model also includes ionospheric processes and their 


electrodynamic coupling with the magnetosphere. The 
coupling between the magnetosphere and the ionosphere 
is an essential part of the model because the ionosphere 
controls, to a large extent, magnetospheric convection, by 
providing the resistive closure of the field-aligned cur- 
rents that are generated from the interaction of the solar 
wind with the magnetosphere [Raeder et al, 1996, 1998]. 
Processes that occur in the near-Earth region on polar 
cap and auroral field lines and that are inherently kinetic 
have been parameterized in the model using empirical 
relationships. These processes include the field-aligned 
potential drops that are associated with upward field- 
aligned currents, electron precipitation caused by the 
field-aligned potential drops [Knight, 1972], and the dif- 
fuse electron precipitation that is caused by pitch angle 
scattering of plasma sheet electrons [Lyons et al, 1979; 
Robinson et al, 1987; Weimer et al, 1987; Kennel and Petschek, 
1966]. The electron precipitation parameters and the iono- 
sphere potential are then passed to the Coupled Ther- 
mosphere Ionosphere Model (CTIM), which is coupled 
to the MHD part of the code. CTIM [Fuller-Row ell et al., 
1996] is a dynamic model of the ionosphere and ther- 
mosphere with a long heritage, covering the globe from 
80 km to several 1000 km altitude, and following several 
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Figure 7. Metrics results for dBldt threshold of 0.3 nT/s individually for events 5 and 6. (a) 
POD and POFD for event 5, (b) POD and POFD for event 6, (c) HSS for event 5, and (d) HSS 
for event 6. Compare to Figures 5 and 6. 


neutral and ionic species and their photochemical inter- 
actions. CTIM computes self-consistently the ionospheric 
Pedersen and Hall conductances, which are then used to 
solve the ionospheric potential equation [see Raeder, 2003, 
for details]. 

[ 42 ] The OpenGGCM requires as input solar wind and 
IMF data, and the F10.7 solar radio flux as a proxy for 
solar UV/EUV radiation. Solar wind and IMF data are 
ballistically propagated from the monitor location to the 
upstream boundary of the simulation. Furthermore, we 
calculate the normal direction of the solar wind fronts 
using the minimum variance method [Sonnerup and Cahill, 
1967, 1968] in order to extrapolate the single point mea- 
surements to the entire inflow surface. Without such 
treatment, the IMF B x component cannot change in time 
without violating V • B = 0 [Raeder et al., 2001c]. 

[ 43 ] The OpenGGCM computes all magnetospheric and 
ionospheric quantities that are necessary to determine 
GICs [Raeder et al, 2001a]. Other quantities of interest to 
space weather can also be derived, such as the total and 
equivalent ionospheric current, ionosphere electron con- 
tent, and neutral density affecting low-Earth orbit satel- 
lites [Li et al, 2011]. The model has been used for a variety 
of studies, for example, for the study of substorms [Raeder 
et al, 2001c, 2008, 2010; Ge et al., 2011; Gilson et al., 2012], 
storms [Raeder et ah, 2001a, 2001b], flux transfer events and 


dayside reconnection [Raeder, 2006; Dorelli et ah, 2004; 
Muhlbacher et ah, 2005; Connor et ah, 2012], ionospheric con- 
vection [Vennerstrom et ah, 2005, 2006; Siscoe et ah, 2004; Lu 
et ah, 2011], and plasma entry under northward IMF [Li 
et ah, 2005, 2008, 2009, 2011]. A more detailed description 
of the code and the methods used can be found in Raeder 
[2003] and Raeder et ah [2008]. 

5. Results 

[ 44 ] To demonstrate a typical storm-time situation. 
Figures 3 and 4 show example time series of the observed 
versus modeled dBldt for the event 2 (Table 1). Again, all 
data are viewable via CCMC's online visualization inter- 
face accessible at http://ccmc.gsfc.nasa.gov/challenges/ 
dBdt/metrics_results.php. As is seen from the figures, 
models do not generally capture the dBldt fluctuations 
point-by-point, which is not surprising considering the 
complex waveform of the signal. However, the ampli- 
tudes of high-latitude dBldt fluctuation especially in the 
beginning of the event are reproduced to a degree by the 
models. Although all models miss some of the observed 
activity, for example, at station ABK (Figure 3) around 
26-32 MLT, the capability, at times, to reproduce com- 
parable dBldt amplitudes indicates that the models may 
provide utility in capturing events within given forecast 
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windows. We will quantify this capability to capture the 
events using metrics discussed in section 3. 

[ 45 ] The final metrics-based analyses were carried out 
for each individual model using events and stations 
described in section 2, and the corresponding contingency 
tables with elements { H,F,M,N } were generated for each 
model for each event and station for dBldt thresholds of 
0.3, 0.7, 1.1 and 1.5 nT/s. Here we will report only the 
results integrated, i.e., summed contingency table ele- 
ments, over all events. The summary results are integrated 
also separately over high-latitude (PBQ/SNK, ABK, YKC) 
and mid-latitude stations (WNG, NEW, OTT). Figures 5 
and 6 show the corresponding POD, POFD, and HSS for 
all participating models. 

[ 46 ] The focus of this paper is to report on the pro- 
cess, metrics, and initial results from the evaluation of 
physics-based and empirical models that predict regional 
ground-based dBldt variations during strong geomagnetic 
activity. Future work is needed to understand where 
model improvements are needed to better represent 
observations. At this stage of the work, it is important 
to quantify model capabilities and to provide informa- 
tion that will be used to assess whether or not these 
models provide useful guidance for improved forecasts 
of regional ground-based magnetic field perturbations. 
While refraining from further interpretations due to the 
nature of the paper, it is quite clear that for a given set 
of stations, events, and metrics, the model 9_SWMF pro- 
vides the highest POD and HSS for most of the thresholds. 
As an indication that large dBldt events are still a chal- 
lenge to capture accurately, for threshold 1.5 nT/s, none 
of the models is capable of providing POD or HSS greater 
than 0.5. 

[ 47 ] We emphasize that for optimal statistics the sum- 
mary results reported here are obtained by integrating 
over selected stations and all events. The results and 
the ranking of the models vary from station to station 
and event to event (see Figure 7). Further, event 1 domi- 
nates the statistics for larger dBldt thresholds due to the 
strength of the Halloween storm event. The breakdown 
of the results (contingency tables) for individual events 
are available in the supporting information and should be 
used for any further interpretations. 

6. Discussion 

[ 48 ] In this work, coordination among the CCMC, 
NOAA SWPC, modelers, and science community has 
resulted in the evaluation of several geospace models 
capable of predicting the fluctuation of the ground mag- 
netic field. The work was a continuation of earlier GEM 
modeling challenges and was designed to support model 
transition into operations at NOAA SWPC. The primary 
NOAA interest in this specific effort was to study models' 
capability to reproduce the observed dBldt, which can be 
used as an indicator for GIC activity. 

[ 49 ] The supporting information by Rastatter et al. 
(submitted manuscript, 2013) describes the development 


of the global (integration over global MHD, gap region, 
and the ionosphere) Biot-Savart integration tool used in 
the activity and CCMC's online interface that can be 
used for further station by station and event by event 
science analyses. We demonstrated here how the online 
tool can be used to study, for example, models' capa- 
bility to capture substorm-related ground magnetic field 
signatures. 

[ 50 ] We reported here the metrics results integrated 
over selected stations and all events. Model 9_SWMF 
provided the highest POD and HSS for all dBldt thresh- 
olds used to build the event detection-based contin- 
gency tables. However, we emphasize again that the 
metrics results vary from station to station and event to 
event. One should thus be cautious in making general 
interpretations without studying the more detailed break- 
down of the analysis. For this purpose, along with CCMC's 
online analysis interface, contingency tables for each indi- 
vidual event are available in the supporting information of 
this paper. 

[ 51 ] Finally, the key question is "are the models good 
enough to provide tangible value for the end user in need 
to mitigate GIC?" This is a multifaceted complex question 
and the answer most likely varies from user to user. Based 
on the summary results for POD, POFD, and HSS with the 
dBldt threshold of 1.5 nT/s, it is clear that predicting large 
dBldt is still a challenge. For example, if one picks, say, 50% 
chance of detecting an event as the threshold (this thresh- 
old will be user and application dependent) for identifying 
a good and not good enough prediction, one can see 
that POD and HSS were below 0.5 for all models for 
the dBldt threshold of 1.5 nT/s. Users requiring localized 
predictions for large dBldt with high likelihood of event 
detection may not be satisfied with the current state of 
the art. However, we saw that models have the capability 
to capture the general level of enhanced activity. Conse- 
quently, users satisfied with more rough characterization 
of dBldt activation over the storm periods may be able to 
use the models for generating actionable information. 

[ 52 ] The models validated in this paper can pro- 
vide short lead-time dBldt predictions. The meaning of 
"short" will vary as a function of the speed of transient 
structures in the solar wind and the computational capac- 
ity available for model execution. Lead times of 15-30 
min at best can be expected for fast coronal mass ejec- 
tion events. Obviously, continuous high-quality upstream 
solar wind plasma and magnetic field monitoring used 
to drive the models are also required. It is important 
to acknowledge that while providing 24/7 data stream, 
ACE SWEPAM plasma experiment has limitations dur- 
ing strong solar energetic particle events [Skoug et al, 
2004] often associated with major Earth-directed coro- 
nal mass ejections. The Deep Space Climate Observatory 
(DSCOVR) mission that will replace ACE as the primary 
upstream monitor is expected to launch in 2014. 

[ 53 ] Finally, one of the results of this effort to evaluate 
geospace models for transition from the research environ- 
ment to operations is that it has accelerated the delivery 
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of new versions of models to the CCMC for use by the 
science community. It has also resulted in the rigorous 
validation of models and initiated feedback from the oper- 
ations to research that will ultimately result in a better 
understanding of where model improvements are most 
needed. 
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